Abstract



In this paper, we show that the quantum logic of linear subspaces can be used for
recognition of random signals by a Bayesian energy discriminant classifier. The energy
distribution on linear subspaces is described by the correlation matrix of the
probability distribution. We show that the correlation matrix corresponds to the von
Neumann density matrix in quantum theory. We suggest an interpretation of quantum
logic as a fuzzy logic of fuzzy sets. The use of quantum logic for recognition is
based on the fact that the probability distribution of each class lies approximately
in a lower-dimensional subspace of the feature space. We offer an interpretation of
discriminant functions as membership functions of fuzzy sets. We also offer a quality
functional for the optimal choice of discriminant functions for recognition from some class
of discriminant functions.
1 Introduction



A Bayesian probabilistic discriminant classifier is based on classical probability theory,
which uses the algebra of subsets. The decision rule of the probabilistic classifier maximizes
the probability of "correct" recognition. A Bayesian energy discriminant classifier was briefly
presented in [12]. It uses the algebra of linear subspaces (quantum logic) instead of the
algebra of subsets, and its decision rule maximizes the energy of "correct"
recognition. The recognition of two classes is considered in detail. The use of quantum
logic for recognition of signals is considered in [10].

The use of linear subspaces as class models is based on the assumption that the distri- 
bution of each class lies approximately in a lower-dimensional subspace of feature space. 
These spaces can be found by principal components analysis carried out individually on 
each class. An input vector from the unknown class is classified according to the greatest 
projection to the subspaces, each of which represents one class. 
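The subspace idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming NumPy is available; the function names `fit_subspace` and `classify`, the toy data, and the choice of one-dimensional class subspaces are assumptions made for the example and do not come from the cited methods.

```python
import numpy as np

def fit_subspace(samples, dim):
    """Orthonormal basis (as columns) of the `dim` leading eigenvectors
    of the class correlation matrix (1/N) X^T X."""
    X = np.asarray(samples, dtype=float)
    K = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    return eigvecs[:, -dim:]

def classify(x, bases):
    """Index of the class whose subspace receives the greatest projection."""
    energies = [float(np.sum((B.T @ x) ** 2)) for B in bases]
    return int(np.argmax(energies))

rng = np.random.default_rng(0)
# Hypothetical toy data: class 0 spreads along e1, class 1 along e2.
class0 = rng.normal(size=(200, 3)) * np.array([3.0, 0.2, 0.2])
class1 = rng.normal(size=(200, 3)) * np.array([0.2, 3.0, 0.2])
bases = [fit_subspace(class0, 1), fit_subspace(class1, 1)]

label_a = classify(np.array([2.0, 0.1, 0.0]), bases)
label_b = classify(np.array([0.1, -2.0, 0.0]), bases)
```

Note that the squared projection length is insensitive to the sign of the input vector, so `x` and `-x` always receive the same label.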

The subspace classifier was suggested by Watanabe (method CLAFIC [3], [4]). This 
method, however, has drawbacks: a priori probabilities of classes are not used; subspaces of 
classes can overlap. T. Kohonen proposed the Learning Subspace Method (LSM) [2], [3].
During training, LSM decreases the number of vectors that are included in subspaces
of different classes. The recognition of handwritten characters by the subspace classifier is
considered in [4]. The subspace classifier is applied to phoneme recognition in [5] and to
speaker recognition in [6].

Y.C. Eldar and A.V. Oppenheim [7] draw a parallel between quantum measurements
and algorithms in signal processing. They propose to exploit the rich mathematical
structure of quantum theory in signal processing without realizing quantum processes. We
suggest considering energy processes instead of quantum processes, because nature spends
some energy to create any signal.

2 Quantum logic as an example of fuzzy logic 

Let $H$ be a Hilbert space. A fuzzy set $A$ of $H$ is a set of ordered pairs
$A = \{x, \mu_A(x) : x \in H\}$, where $\mu_A(x)\colon H \to [0, \infty)$ is the membership
function of the fuzzy set $A$. Suppose $\mu_A(x)$ is not necessarily normal:
$\sup \mu_A(x) \neq 1$, $x \in H$. The set of membership functions is a partially ordered
set equipped with the partial order relation $\mu_A(x) \le \mu_B(x)$ for all $x \in H$.
The result of the operations

$$\mu_A(x) \wedge \mu_B(x) = \inf(\mu_A(x), \mu_B(x)), \qquad \mu_A(x) \vee \mu_B(x) = \sup(\mu_A(x), \mu_B(x))$$

is defined pointwise and is again a nonnegative function. Hence, the set of
membership functions is a lattice.

Each closed linear subspace $M \subset H$ corresponds to an elementary logical proposition
of quantum logic. Each linear subspace $M$ has an orthogonal projection $P_M$ onto $M$, so
a proposition of quantum logic can be associated with the orthogonal projection. The set
of all orthogonal projections is a lattice equipped with the partial order relation
$P \le R$ if $(Px, x) \le (Rx, x)$ for all $x \in H$. Hence every pair of projections $P$, $R$
has a unique supremum (least upper bound) and a unique infimum (greatest lower bound):

$$P \wedge R = \inf(P, R), \qquad P \vee R = \sup(P, R).$$

The operations $P \wedge R$, $P \vee R$, and $P^{\perp} = I - P$ are the conjunction, disjunction,
and negation of quantum logic, respectively.

Each projection $P_M$ onto the subspace $M$ can be viewed as a filter [10] that passes
some energy $\mu_M(x) = (P_M x, x) = \|P_M x\|^2$ of the signal $x$ (in quantum theory, a
projection passes some quantum probability). This energy evaluates the degree of membership
of the signal $x$ in the subspace $M$. So each linear subspace $M \subset H$ can be associated
with the fuzzy set

$$A_M = \{x, \mu_M(x) : x \in H\}, \qquad \text{where } \mu_M(x) = (P_M x, x).$$

The set of all membership functions $\{\mu_M(x), M \subset H\}$ is a lattice equipped with the
partial order relation $(P_M x, x) \le (P_N x, x)$ for all $x \in H$. So the supremum and
infimum operations of that lattice can be used as the fuzzy logic disjunction and conjunction
of the fuzzy sets $\{A_M, M \subset H\}$. The fuzzy logic negation of the fuzzy set $A_M$ with
membership function $\mu_M(x)$ can be defined as the fuzzy set $A_{M^{\perp}}$ with the
membership function
$\mu_{M^{\perp}}(x) = (P_{M^{\perp}} x, x) = (P_M^{\perp} x, x) = ((I - P_M)x, x)$, where the
subspace $M^{\perp}$ is the orthogonal complement of the subspace $M$. Thus the fuzzy sets
$\{A_M, M \subset H\}$ form a fuzzy logic.
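As a small numerical illustration of the membership function $(P_M x, x)$ and of the negation $I - P_M$, the following Python sketch (assuming NumPy; the helper names `projector` and `membership` and the chosen plane in $R^3$ are hypothetical) checks that a subspace and its orthogonal complement together pass the whole energy $\|x\|^2$ of a signal.

```python
import numpy as np

def projector(spanning_vectors):
    """Orthogonal projection onto the span of the given column vectors."""
    Q, _ = np.linalg.qr(np.asarray(spanning_vectors, dtype=float))
    return Q @ Q.T

def membership(P, x):
    """Energy (P x, x) that the projection P passes for the signal x."""
    return float(x @ P @ x)

# Hypothetical subspace M: the plane in R^3 spanned by (1,1,0) and (0,1,1).
P_M = projector(np.array([[1.0, 0.0],
                          [1.0, 1.0],
                          [0.0, 1.0]]))
P_neg = np.eye(3) - P_M            # negation: projection onto the complement

x = np.array([1.0, 2.0, 3.0])
mu_M = membership(P_M, x)
mu_neg = membership(P_neg, x)
```

Here the complement of the plane is the line along $(1, -1, 1)$, so `mu_neg` equals $4/3$ and `mu_M` equals $14 - 4/3$.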



3 Discriminant functions as membership functions 

If an object of recognition is described by a vector $x = (x_1, x_2, \dots, x_n)$, then the vector
$x$ is the pattern of the object in the feature space $H = R^n$. The membership of the object in
some class $S_i$, $i = 1 \dots l$, is an additional feature, which can be defined as the index $i$ of
the class, where $i \in I = \{1, 2, \dots, l\}$.

We use discriminant functions for the classifier. Discriminant functions
are a set of functions $g_i(x)$, $i = 1 \dots l$, that determine the membership of the object with
the pattern $x$ in some class $S_i$ according to the following decision rule: if the object with
the pattern $x$ satisfies $g_i(x) > g_j(x)$ for all $j \neq i$, then the object having the pattern $x$
belongs to the class $S_i$.

Discriminant functions split the feature space $H$ into disjoint sets:

$$A_i = \{x : g_i(x) > g_j(x),\ j = 1 \dots l,\ j \neq i\}.$$

Thus, if $x \in A_i$, then the object having the pattern $x$ belongs to the class $S_i$. However,
there are sets $\{x : g_i(x) = g_j(x),\ j \neq i\}$, $i = 1 \dots l$, whose elements cannot be
assigned to any set $A_i$, $i = 1 \dots l$, by this rule. Usually these sets are included in the
sets $A_i$, $i = 1 \dots l$.
Using discriminant functions, the classifier determines only a "likelihood" value for
the membership of the object with the pattern $x$ in some class $S_i$. So the discriminant functions
$g_i(x)$, $i = 1 \dots l$, are membership functions. In the following, we assume that discriminant
functions are nonnegative and not necessarily normal: $\sup g_i(x) \neq 1$, $x \in H$, $i = 1 \dots l$.

4 Quality functional for a choice of optimal decision rule 

We shall use a probabilistic model for recognition. Let $(\Omega, \mathcal{A}, \mathsf{P})$ be a
probability space, where the sample space $\Omega$ is the set of recognition objects. It is evident
that the recognition classes $S_1, S_2, \dots, S_l$ form a partition of $\Omega$:
$S_1 + S_2 + \dots + S_l = \Omega$, where $S_i \cap S_j = \emptyset$ for all $i \neq j$.

Following Zadeh [1], a fuzzy set $A$ is called a fuzzy event if the corresponding membership
function $\mu_A(\omega)\colon \Omega \to [0, \infty)$ is $\mathcal{A}$-measurable. The probability
of a fuzzy event is defined as

$$\mathsf{P}(A) = \mathsf{E}\mu_A = \int_{\Omega} \mu_A \, d\mathsf{P}. \tag{1}$$

Suppose that an object $\omega$ is described by the vector
$\xi(\omega) = (\xi_1(\omega), \xi_2(\omega), \dots, \xi_n(\omega))$ of features, where each
$\xi_i(\omega)\colon \Omega \to R$, $i = 1 \dots n$, is an $\mathcal{A}$-measurable random variable.
Since an object $\omega$ has the pattern $x$ in the feature space $H$, there is a map
$\xi(\omega)\colon \Omega \to H$. If $\omega \in S_i$, then we can define an integer-valued random
variable $\gamma$ such that $\gamma(\omega) = i$ for all $\omega \in S_i$, where
$i \in I = \{1, 2, \dots, l\}$. The sample space $\Omega$ of the objects is usually not
accessible to immediate observation; therefore it is necessary to deal with the feature space
$H$. However, $\Omega$ can be identified with $I \times H$.

We use a Bayesian method, which needs the a priori probabilities $p_i = \mathsf{P}(S_i)$, $i = 1 \dots l$,
and the conditional distributions $\mu_i(A) = \mathsf{P}(\xi \in A \mid S_i)$, $i = 1 \dots l$.
Since $\mathsf{P}(S_i) = \mathsf{P}(\gamma = i)$, it follows that $p_i$, $i = 1 \dots l$, is the
probability distribution of the random variable $\gamma$.



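Let $\mu(B, A) = \mathsf{P}(\gamma \in B, \xi \in A)$ be the joint distribution of the random
variables $\gamma$, $\xi$, where $B = \{i_1, i_2, \dots, i_m\} \subset I$ and $A \in \mathcal{B}$.
We have $\mu_i(A) = \mathsf{P}(\xi \in A \mid S_i)$, $i = 1 \dots l$. Since $S_i = (\gamma = i)$, we get

$$\mu(\{i\}, A) = \mathsf{P}(\gamma = i, \xi \in A) = \mathsf{P}(\xi \in A \mid \gamma = i)\,\mathsf{P}(\gamma = i) = \mathsf{P}(\xi \in A \mid S_i)\,\mathsf{P}(S_i) = p_i \mu_i(A).$$

Let us denote $\mu_1(\{i\}) = p_i$ and $\mu_2^1(i, A) = \mu_i(A)$. We have

$$\mu(B, A) = \mathsf{P}\Big(\bigcup_{k=1}^{m} (\gamma = i_k) \cap (\xi \in A)\Big) = \sum_{k=1}^{m} \mathsf{P}(\xi \in A \mid \gamma = i_k)\,\mathsf{P}(\gamma = i_k) = \sum_{k=1}^{m} p_{i_k} \mu_{i_k}(A) = \sum_{k=1}^{m} \mu_1(\{i_k\})\, \mu_2^1(i_k, A).$$

It follows that $\mu_2^1(i, A) = \mu_i(A)$ is the transition probability on $I \times \mathcal{B}$ [11],
where $\mathcal{B}$ is the $\sigma$-algebra of Borel subsets of the feature space $H = R^n$.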

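The discriminant functions $g_i(x)$, $i = 1 \dots l$, define a random variable
$g_{\gamma}(\xi) = g(\gamma, \xi)$. Since $\mu_2^1(i, A) = \mu_i(A)$ is the transition probability
on $I \times \mathcal{B}$ [11], we have

$$\mathsf{E} g(\gamma, \xi) = \int_I \mu_1(di) \int_H g(i, x)\, \mu_2^1(i, dx) = \sum_{i=1}^{l} p_i \int_H g_i(x)\, \mu_i(dx). \tag{2}$$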



Suppose $H = A_1 + A_2 + \dots + A_l$, where $A_i$, $i = 1 \dots l$, are disjoint sets. Let $\Phi$ be a
class of discriminant functions that contains only indicator functions:

$$g_i(x) = I_{A_i}(x) = \begin{cases} 1 & \text{if } x \in A_i, \\ 0 & \text{if } x \notin A_i. \end{cases}$$

It is evident that $g_{\gamma(\omega)}(\xi(\omega)) = g(\gamma(\omega), \xi(\omega))$ is the indicator
function with the support

$$G = \sum_{i=1}^{l} (\xi \in A_i) \cap (\gamma = i) = \sum_{i=1}^{l} (\xi \in A_i) \cap S_i.$$

We can say that the indicator function $I_G = g(\gamma, \xi)$ is the membership function of
"correct" recognition, where $G$ is the crisp event of "correct" recognition. By (2), we have

$$\mathsf{P}(G) = \mathsf{E} g(\gamma, \xi) = \sum_{i=1}^{l} p_i \int_H g(i, x)\, \mu_i(dx) = \sum_{i=1}^{l} \mathsf{P}(\xi \in A_i \mid S_i)\, \mathsf{P}(S_i). \tag{3}$$

A Bayesian probabilistic discriminant classifier splits the feature space $H$ into disjoint
sets $H = A_1 + A_2 + \dots + A_l$ such that the probability (3) of the crisp event $G$ of "correct"
recognition is maximal.

Let $g_i(x)$, $i = 1 \dots l$, be discriminant functions from some class $\Phi$, where each function
$g_i(x)\colon H \to [0, \infty)$ is a Borel-measurable membership function of the class $S_i$. Then the
random variable $g_i(\xi(\omega))$, $i = 1 \dots l$, on $\Omega$ is a membership function such that the
value $g_i(\xi(\omega))$ is the membership degree of the object $\omega$ in the class $S_i$. We define
a fuzzy event as follows: $G_i = \{\omega, g_i(\xi(\omega)) : \omega \in \Omega\}$ for all $i = 1 \dots l$.



Let us define the membership function:

$$\mu_j(i, \omega) = \begin{cases} g_i(\xi(\omega)) & \text{if } \omega \in S_j, \\ 0 & \text{if } \omega \notin S_j. \end{cases}$$



This membership function defines the fuzzy event $S_j G_i = \{\omega, \mu_j(i, \omega) : \omega \in \Omega\}$,
which is the algebraic product [1] of the events $G_i$ and $S_j$. The value $\mu_j(i, \omega)$ is the
membership degree of the object $\omega$ in the class $S_i$ if the statement $\omega \in S_j$ is true.
There can be two cases. First, if $j = i$, then $\mu_i(i, \omega)$ is the membership degree of the
object $\omega$ in the class $S_i$ when the object $\omega$ belongs to its own class $S_i$. We call
the value $\mu_i(i, \omega)$ a "correct" degree of membership, and we call the fuzzy event $S_i G_i$ a
fuzzy event of "correct" recognition. Second, if $j \neq i$, then $\mu_j(i, \omega)$ is the membership
degree of the object $\omega$ in the class $S_i$ when the object $\omega$ belongs to another class
$S_j$. We call the value $\mu_j(i, \omega)$, $j \neq i$, an "error" degree of membership, and we call
the fuzzy event $S_j G_i$, $j \neq i$, a fuzzy event of "error" recognition.
Since $I_{S_i} = I_{(\gamma = i)}$ for all $i = 1 \dots l$, we can define a membership function:

$$g(\gamma(\omega), \xi(\omega)) = \sum_{i=1}^{l} \mu_i(i, \omega) = \sum_{i=1}^{l} I_{S_i}(\omega)\, g_i(\xi(\omega)) = \sum_{i=1}^{l} I_{(\gamma = i)}(\omega)\, g_i(\xi(\omega)).$$

This membership function defines a degree of "correct" membership for all objects $\omega \in \Omega$.
We call the random variable $g(\gamma, \xi)$ a membership function of "correct" recognition and
the fuzzy set $G = \{\omega, g(\gamma(\omega), \xi(\omega)) : \omega \in \Omega\}$ a fuzzy event of
"correct" recognition.

It is natural to choose discriminant functions $g_i(x)$, $i = 1 \dots l$, from the class $\Phi$ such
that the probability of the fuzzy event $G$ of "correct" recognition is maximal. From
(1) and (2), the probability of the fuzzy event $G$ is

$$\mathsf{P}(G) = \mathsf{E} g(\gamma, \xi) = \sum_{i=1}^{l} p_i \int_H g(i, x)\, \mu_i(dx). \tag{4}$$

Thus (4) also defines a quality functional for the choice of discriminant functions from the class $\Phi$.
Let us show another interpretation of the quality functional (4) . We define 

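Let us denote $\mu_1(\{i\}) = p_i$. Since $I_{S_j} = I_{(\gamma = j)}$ and $\mu_2^1(i, A) = \mu_i(A)$ is a
transition probability on $I \times \mathcal{B}$, it follows that [11]

$$\begin{aligned}
\mathsf{E}\big(I_{S_j} g_i(\xi)\big) &= \mathsf{E}\big(I_{(\gamma = j)} g_i(\xi)\big) \\
&= \iint_{I \times H} I_{(k = j)}\, g_i(x)\, \mu(dk, dx) = \int_I \mu_1(dk)\, I_{(k = j)} \int_H g_i(x)\, \mu_2^1(k, dx) \\
&= \sum_{k=1}^{l} \mu_1(\{k\})\, I_{(k = j)} \int_H g_i(x)\, \mu_2^1(k, dx) = p_j \int_H g_i(x)\, \mu_j(dx).
\end{aligned}$$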

Then the probability of the fuzzy event
$S_j G_i = \{\omega, I_{S_j}(\omega)\, g_i(\xi(\omega)) : \omega \in \Omega\}$ is defined as

$$r_j(i) = \mathsf{P}(S_j G_i) = \mathsf{E}\big(I_{S_j} g_i(\xi)\big) = p_j \int_H g_i(x)\, \mu_j(dx). \tag{5}$$



We call the value $r_j(i)$ a "correct" probability of recognition if $i = j$ and an "error"
probability of recognition if $i \neq j$. The sum of all the "correct" probabilities of recognition
is

$$\sum_{i=1}^{l} r_i(i) = \sum_{i=1}^{l} \mathsf{P}(S_i G_i) = \sum_{i=1}^{l} \mathsf{E}\big(I_{S_i} g_i(\xi)\big) = \sum_{i=1}^{l} p_i \int_H g_i(x)\, \mu_i(dx) = \mathsf{E} g(\gamma, \xi) = \mathsf{P}(G).$$

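Let us define the conditional expectation of a random variable relative to an event:

$$\mathsf{E}\big(g_i(\xi) \mid S_i\big) = \frac{\mathsf{E}\big(I_{S_i} g_i(\xi)\big)}{\mathsf{P}(S_i)}, \qquad \text{so that } \mathsf{P}(S_i G_i) = \mathsf{E}\big(g_i(\xi) \mid S_i\big)\, \mathsf{P}(S_i), \quad i = 1 \dots l.$$

Then we get one more interpretation of the quality functional (4):

$$\mathsf{P}(G) = \mathsf{E} g(\gamma, \xi) = \mathsf{E} \sum_{i=1}^{l} I_{(\gamma = i)}\, g(\gamma, \xi) = \sum_{i=1}^{l} \mathsf{E}\big(I_{S_i} g_i(\xi)\big) = \sum_{i=1}^{l} \mathsf{E}\big(g_i(\xi) \mid S_i\big)\, \mathsf{P}(S_i).$$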

5 Basic formula 

We consider the feature vector $\xi(\omega)\colon \Omega \to H$ as a random signal. Suppose $\mu$ is the
probability distribution of the random signal $\xi$. Let us define one linear form and two
bilinear forms for the random signal $\xi$:

$$(m, y) = \mathsf{E}(\xi, y) = \int_H (x, y)\, \mu(dx),$$

$$(Ky, z) = \mathsf{E}\big((\xi, y)(\xi, z)\big) = \int_H (x, y)(x, z)\, \mu(dx), \tag{6}$$

$$(Ry, z) = \mathsf{E}\big((\xi - m, y)(\xi - m, z)\big) = \int_H (x - m, y)(x - m, z)\, \mu(dx). \tag{7}$$

The non-random signal $m$, the operator $K$, and the operator $R$ are called the mathematical
expectation, the correlation operator, and the covariance operator, respectively.

From (6) and (7), we have $(Ky, z) = (Ry, z) + (m, y)(m, z)$. Then
$(Ry, z) + (m, y)(m, z) = ((R + p_m)y, z)$, where $p_m y = (y, m)m$ is a rank-one operator. It is
evident that $p_m y = \|m\|^2 p_{\hat m} y$, where $\hat m = m/\|m\|$ and
$p_{\hat m} y = (y, \hat m)\hat m$ is a one-dimensional projection. Then

$$K = R + p_m = R + \|m\|^2 p_{\hat m}. \tag{8}$$
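Formula (8) can be checked numerically on an empirical distribution. The following Python sketch is an illustration only, assuming NumPy; the sample `xi` and all variable names are hypothetical. The operators are estimated with the factor $1/N$, in which case the identity holds exactly for the empirical distribution and not only approximately.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sample of a random signal in R^4 with a non-zero mean.
xi = rng.normal(size=(1000, 4)) + np.array([2.0, -1.0, 0.5, 0.0])

m = xi.mean(axis=0)                        # mathematical expectation
K = xi.T @ xi / len(xi)                    # correlation operator, as in (6)
centered = xi - m
R = centered.T @ centered / len(xi)        # covariance operator, as in (7)

m_hat = m / np.linalg.norm(m)
p_m_hat = np.outer(m_hat, m_hat)           # one-dimensional projection onto m_hat
K_from_8 = R + np.linalg.norm(m) ** 2 * p_m_hat
```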

Let the signal $x = \xi(\omega)$ be the pattern of the object $\omega$. The affine structure of the
Hilbert space $H$ is used when realizations of the random signal are considered as points. Using the
vector structure of $H$, it is possible to interpret the value $\|x\|^2$ as a physical quantity, for
example, as energy, power, or intensity. The value $\|x\|^2$ is a measure of the deviation of the
signal from the zero vector, and nature uses some energy for this deviation. In the following, let
this value be energy.

Let $(A\xi, \xi)$ be a quadratic form, where $A$ is a linear operator. Then

$$\mathsf{E}(A\xi, \xi) = \int_H (Ax, x)\, \mu(dx) = \int_H (x, Ax)\, \mu(dx) = \operatorname{tr} KA = \operatorname{tr} AK. \tag{9}$$



If $P$ is an orthogonal projection, then $(P\xi, \xi)$ is a membership function. We can
define a fuzzy event $A_P = \{\omega, (P\xi(\omega), \xi(\omega)) : \omega \in \Omega\}$. From (1) and
(9), the probability of the fuzzy event $A_P$ is

$$\mathsf{P}(A_P) = \mathsf{E}(P\xi, \xi) = \int_H (Px, x)\, \mu(dx) = \operatorname{tr} PK = \operatorname{tr} KP.$$

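We now prove formula (9). Let $\{e_i\}$, $i = 1 \dots n$, be an orthonormal basis in $H$. Using the
definitions of the trace and of the correlation operator (6), we have

$$\begin{aligned}
\operatorname{tr} KA &= \sum_{i=1}^{n} (KAe_i, e_i) = \sum_{i=1}^{n} \int_H (x, Ae_i)(x, e_i)\, \mu(dx) = \int_H \sum_{i=1}^{n} \big(A^* x, (x, e_i)e_i\big)\, \mu(dx) \\
&= \int_H \Big(A^* x, \sum_{i=1}^{n} (x, e_i)e_i\Big)\, \mu(dx) = \int_H (A^* x, x)\, \mu(dx) = \int_H (x, Ax)\, \mu(dx).
\end{aligned}$$

Since the scalar product is symmetric in a real Hilbert space, $(x, y) = (y, x)$, we get
$(Ax, x) = (x, Ax)$. Then

$$\begin{aligned}
\operatorname{tr} AK &= \sum_{i=1}^{n} (AKe_i, e_i) = \sum_{i=1}^{n} (Ke_i, A^* e_i) = \sum_{i=1}^{n} \int_H (x, e_i)(x, A^* e_i)\, \mu(dx) \\
&= \int_H \Big(\sum_{i=1}^{n} (x, e_i)e_i, Ax\Big)\, \mu(dx) = \int_H (x, Ax)\, \mu(dx) = \int_H (Ax, x)\, \mu(dx) = \mathsf{E}(A\xi, \xi).
\end{aligned}$$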
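The trace formula (9) can also be verified numerically. The sketch below is an illustration under stated assumptions (NumPy, a hypothetical sample `xi`, an arbitrary non-symmetric matrix `A`); it compares the sample mean of $(Ax, x)$ with both traces for the empirical distribution of the sample.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical sample of a random signal in R^3 and an arbitrary linear operator A.
xi = rng.normal(size=(500, 3)) + np.array([1.0, 0.0, -2.0])
A = rng.normal(size=(3, 3))

K = xi.T @ xi / len(xi)                                   # correlation operator (6)
# (A x, x) = sum_{i,j} x_i A_ij x_j for every sample x, then averaged.
expected_form = float(np.mean(np.einsum("ni,ij,nj->n", xi, A, xi)))
trace_KA = float(np.trace(K @ A))
trace_AK = float(np.trace(A @ K))
```

With `A` replaced by an orthogonal projection, `expected_form` is the mean energy that the projection passes.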

Statistical states of a quantum system are described by the von Neumann density matrix [8].
In fact, the von Neumann density matrix is the correlation matrix of a discrete probability
distribution. Formula (9) makes it possible to describe statistical states of a quantum system with
continuous probability distributions.

6 Recognition of two signal classes 

C. W. Helstrom was the first to consider recognition of two classes in quantum theory [8].
We apply Helstrom's result to the recognition of two classes of random signals; we only
consider an energy distribution instead of a quantum probability distribution on projections.

Assume that the object $\omega$ of recognition belongs to one of the classes $S_i$, $i = 1, 2$,
and the pattern of the object is the signal $x = \xi(\omega)$. Suppose that each class $S_i$, $i = 1, 2$,
is matched with an orthogonal projection $P_i$, $i = 1, 2$, where $P_1 + P_2 = I$. Then the
value $(P_i x, x) = (P_i \xi(\omega), \xi(\omega)) = g_i(\xi(\omega))$ is the membership of the object
$\omega$ in the class $S_i$, $i = 1, 2$. Therefore, the projections $P_i$, $i = 1, 2$, define a class
$\Phi$ of discriminant functions $g_i(x) = (P_i x, x)$, $i = 1, 2$.

Let $p_i = \mathsf{P}(S_i)$, $i = 1, 2$, be the a priori probabilities of the classes, and let the
conditional distributions $\mu_i(A) = \mathsf{P}(\xi \in A \mid S_i)$, $i = 1, 2$, have the correlation
operators $K_i$, $i = 1, 2$. We define a fuzzy event
$G = \{\omega, g(\gamma(\omega), \xi(\omega)) : \omega \in \Omega\}$, where
$g(\gamma, \xi) = (P_{\gamma}\xi, \xi)$. By (4), we must maximize the probability of the fuzzy event $G$:

$$\mathsf{P}(G) = \mathsf{E} g(\gamma, \xi) = p_1 \int_H (P_1 x, x)\, \mu_1(dx) + p_2 \int_H (P_2 x, x)\, \mu_2(dx). \tag{10}$$



Let us suggest an energy interpretation of formula (10). Using (5) and (9), we have

$$r_j(i) = \mathsf{E}\big(I_{S_j} (P_i \xi, \xi)\big) = p_j \int_H (P_i x, x)\, \mu_j(dx) = p_j \operatorname{tr} P_i K_j.$$

Each projection $P_i$, $i = 1, 2$, passes some energy of the signals $x = \xi(\omega)$ from its own class
$S_j$, $i = j$, and from the other class $S_j$, $i \neq j$. We call the energy $r_j(i)$ a "correct" energy
if $i = j$ and an "error" energy if $i \neq j$. We also call the total "correct" energy passed by the
projections of all classes the energy of "correct" recognition. This energy is defined as

$$\mathrm{Enr}_C(P_1, P_2) = r_1(1) + r_2(2) = p_1 \operatorname{tr} P_1 K_1 + p_2 \operatorname{tr} P_2 K_2. \tag{11}$$

It is clear that we must find projections $P_1$, $P_2$ such that the value $\mathrm{Enr}_C(P_1, P_2)$ is
the largest. In other words, the projections $P_1$, $P_2$ together must pass as much energy of the
signals from their own classes as possible.
Since $P_2 = I - P_1$, we have

$$\mathrm{Enr}_C(P_1, P_2) = p_2 \operatorname{tr} K_2 + \operatorname{tr} P_1 (p_1 K_1 - p_2 K_2).$$

Here the first term is constant and only the second term depends on the projection $P_1$.
Hence we must find the projection $P_1$ for which the second term is the largest. Assume
that $\lambda_i$, $i = 1 \dots n$, are the eigenvalues and $y_i$, $i = 1 \dots n$, are the corresponding
orthonormal eigenvectors of the symmetric operator $p_1 K_1 - p_2 K_2$. Then

$$\begin{aligned}
\operatorname{tr} P_1 (p_1 K_1 - p_2 K_2) &= \sum_{i=1}^{n} \big(P_1 (p_1 K_1 - p_2 K_2) y_i, y_i\big) = \sum_{i=1}^{n} (P_1 \lambda_i y_i, y_i) \\
&= \sum_{i=1}^{n} \lambda_i \|P_1 y_i\|^2 = \sum_{\lambda_i > 0} \lambda_i \|P_1 y_i\|^2 + \sum_{\lambda_i < 0} \lambda_i \|P_1 y_i\|^2 = d_1 + d_2,
\end{aligned}$$

where $\|P_1 y_i\|^2 \le \|y_i\|^2$ for all $i = 1 \dots n$, $d_1 \ge 0$, $d_2 \le 0$. Let $P_1$ be the
projection onto the subspace spanned by the eigenvectors with positive eigenvalues. Then
$\|P_1 y_i\|^2 = \|y_i\|^2$ if $\lambda_i > 0$ and $\|P_1 y_i\|^2 = 0$ if $\lambda_i < 0$. It follows
that $d_1$ is the largest possible and $d_2 = 0$. Hence the required projection $P_1$ is found, and
$P_2 = I - P_1$.
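The construction of the optimal projections can be sketched as follows. This is a minimal Python illustration, assuming NumPy and two hypothetical diagonal correlation matrices; the function names `optimal_projections` and `enr_c` are not from the paper. The sketch also compares the optimum with a number of random orthogonal projections.

```python
import numpy as np

def optimal_projections(K1, K2, p1=0.5, p2=0.5):
    """P1 projects onto the eigenvectors of p1*K1 - p2*K2 with positive
    eigenvalues; P2 = I - P1."""
    D = p1 * K1 - p2 * K2
    eigvals, eigvecs = np.linalg.eigh(D)       # D is symmetric
    Y = eigvecs[:, eigvals > 0]
    P1 = Y @ Y.T
    return P1, np.eye(D.shape[0]) - P1

def enr_c(P1, P2, K1, K2, p1=0.5, p2=0.5):
    """Energy of "correct" recognition, formula (11)."""
    return float(p1 * np.trace(P1 @ K1) + p2 * np.trace(P2 @ K2))

# Hypothetical correlation matrices of the two classes.
K1 = np.diag([4.0, 1.0, 0.5])
K2 = np.diag([0.5, 2.0, 3.0])
P1, P2 = optimal_projections(K1, K2)
best = enr_c(P1, P2, K1, K2)

# Compare with random orthogonal projections of every possible rank.
rng = np.random.default_rng(0)
random_best = -np.inf
for _ in range(200):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    B = Q[:, : rng.integers(0, 4)]             # random subspace of rank 0..3
    Pa = B @ B.T
    random_best = max(random_best, enr_c(Pa, np.eye(3) - Pa, K1, K2))
```

For these matrices $p_1 K_1 - p_2 K_2$ has the single positive eigenvalue $1.75$, so $P_1$ is the projection onto the first coordinate axis and `best` equals $4.5$.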

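Comment 1. It is possible to minimize the energy of "error" recognition instead. The energy of
"error" recognition is the following sum:

$$\mathrm{Enr}_E(P_1, P_2) = r_1(2) + r_2(1) = p_1 \operatorname{tr} P_2 K_1 + p_2 \operatorname{tr} P_1 K_2.$$

If the projections $P_1$, $P_2$ maximize the energy of "correct" recognition, then they also
minimize the energy of "error" recognition. Indeed, we have

$$\begin{aligned}
\mathrm{Enr}_E(P_1, P_2) &= p_1 \operatorname{tr}(P_2 K_1) + p_2 \operatorname{tr}(P_1 K_2) = p_1 \operatorname{tr}(I - P_1)K_1 + p_2 \operatorname{tr}(I - P_2)K_2 \\
&= p_1 \operatorname{tr} K_1 + p_2 \operatorname{tr} K_2 - p_1 \operatorname{tr} P_1 K_1 - p_2 \operatorname{tr} P_2 K_2 \\
&= p_1 \operatorname{tr} K_1 + p_2 \operatorname{tr} K_2 - \mathrm{Enr}_C(P_1, P_2).
\end{aligned} \tag{12}$$

Here the values $p_1 \operatorname{tr} K_1$ and $p_2 \operatorname{tr} K_2$ are constant. Hence the value
$\mathrm{Enr}_E(P_1, P_2)$ is the least when the value $\mathrm{Enr}_C(P_1, P_2)$ is the greatest.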

Comment 2. From (12) it follows that the sum of the energy of "correct" recognition and the energy
of "error" recognition is constant. Thus, by increasing the energy of "correct" recognition, we
decrease the energy of "error" recognition, and vice versa.



7 Decision rule for recognition 

Suppose there are two classes of objects $S_i$, $i = 1, 2$, and the signal $x = \xi(\omega)$ is the
pattern of the object $\omega$. If we use a probabilistic Bayesian classifier, then the feature space
$H$ is divided into disjoint subsets $L_1$, $L_2$, $L_1 \cup L_2 = H$, where the subset $L_1$
corresponds to the class $S_1$ and the subset $L_2$ corresponds to the class $S_2$. The decision rule
that determines unambiguously to which of the classes $S_1$ or $S_2$ the object $\omega$ belongs is
defined as follows: $\omega \in S_1$ if $x \in L_1$ and $\omega \in S_2$ if $x \in L_2$.

However, the situation is different when quantum logic is used. Suppose each class $S_i$,
$i = 1, 2$, is matched with an orthogonal projection $P_i$, $i = 1, 2$, where $P_1 + P_2 = I$. Denote
$L_1 = P_1 H$, $L_2 = P_2 H$, where $L_1 \oplus L_2 = H$. Then the pattern of the object $x = \xi(\omega)$
can be a sum of two signals: $x = P_1 x + P_2 x = x_1 + x_2$, where $x_1 \in L_1$, $x_2 \in L_2$. It is
natural to accept that $\omega \in S_1$ if $P_1 x = x$ and $\omega \in S_2$ if $P_2 x = x$. If
$x_1 \neq 0$ and $x_2 \neq 0$, then the pattern $x$ has components in both subspaces $L_1$ and $L_2$.
Hence we cannot decide to which class the object belongs using the subspaces of quantum logic alone.
Therefore we must use the discriminant functions $g_i(x) = (P_i x, x)$, $i = 1, 2$, which give an
unambiguous decision about the membership of the object in one of the classes $S_1$ or $S_2$. By (11),
we can find discriminant functions $g_1(x) = (P_1 x, x)$ and $g_2(x) = (P_2 x, x)$ that
maximize the energy of "correct" recognition. Thus we have the following decision rule:

$$\omega \in S_1 \ \text{if} \ (P_1 x, x) > (P_2 x, x) \quad \text{and} \quad \omega \in S_2 \ \text{otherwise}. \tag{13}$$

When the decision rule (13) is applied, the feature space $H$ is divided into the disjoint sets
$A_1 = \{x : (P_1 x, x) > (P_2 x, x)\}$ and $A_2 = \{x : (P_2 x, x) \ge (P_1 x, x)\}$. We put

$$\mathrm{Enr}_C(A_1, A_2) = p_1 \int_{A_1} (P_1 x, x)\, \mu_1(dx) + p_2 \int_{A_2} (P_2 x, x)\, \mu_2(dx).$$

It is evident that

$$\mathrm{Enr}_C(P_1, P_2) = \mathrm{Enr}_C(A_1, A_2) + p_1 \int_{A_2} (P_1 x, x)\, \mu_1(dx) + p_2 \int_{A_1} (P_2 x, x)\, \mu_2(dx). \tag{14}$$

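The object $\omega$ of recognition is chosen at random, but we expect the value of
the discriminant function $g_i(x)$ of the class $S_i$ to be maximal if the statement $\omega \in S_i$ is
true. It is also natural to expect that $\mathrm{Enr}_C(P_1, P_2)$ is approximately equal to
$\mathrm{Enr}_C(A_1, A_2)$. Using $(P_1 x, x) \le (P_2 x, x)$ on $A_2$ and $(P_2 x, x) < (P_1 x, x)$ on
$A_1$, we get

$$p_1 \int_{A_2} (P_1 x, x)\, \mu_1(dx) \le p_1 \int_{A_2} (P_2 x, x)\, \mu_1(dx) \le p_1 \int_H (P_2 x, x)\, \mu_1(dx) = p_1 \operatorname{tr} P_2 K_1,$$

$$p_2 \int_{A_1} (P_2 x, x)\, \mu_2(dx) \le p_2 \int_{A_1} (P_1 x, x)\, \mu_2(dx) \le p_2 \int_H (P_1 x, x)\, \mu_2(dx) = p_2 \operatorname{tr} P_1 K_2.$$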

From (14) it follows that

$$0 \le \mathrm{Enr}_C(P_1, P_2) - \mathrm{Enr}_C(A_1, A_2) \le p_1 \operatorname{tr} P_2 K_1 + p_2 \operatorname{tr} P_1 K_2 = \mathrm{Enr}_E(P_1, P_2). \tag{15}$$

If the projections $P_1$, $P_2$ maximize the energy $\mathrm{Enr}_C(P_1, P_2)$ of "correct" recognition,
then from Comment 1 it follows that they minimize the energy $\mathrm{Enr}_E(P_1, P_2)$ of
"error" recognition. If recognition with the projections $P_1$, $P_2$ is good, then the value
$\mathrm{Enr}_E(P_1, P_2)$ is small. Therefore from (15) it follows that $\mathrm{Enr}_C(P_1, P_2)$ is
approximately equal to $\mathrm{Enr}_C(A_1, A_2)$.
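The decision rule (13) and the bound (15) can be illustrated on empirical distributions. The following Python sketch assumes NumPy and two hypothetical zero-mean samples `x1`, `x2`; the helper names `energy` and `decide` are introduced only for this example. Since (15) holds for any pair of distributions, it also holds exactly for the empirical ones used here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
# Hypothetical samples: class 1 spreads mostly along e1, class 2 along e3.
x1 = rng.normal(size=(400, n)) * np.array([3.0, 1.0, 0.5])
x2 = rng.normal(size=(400, n)) * np.array([0.5, 1.0, 3.0])
p1 = p2 = 0.5
K1 = x1.T @ x1 / len(x1)
K2 = x2.T @ x2 / len(x2)

# Optimal projections: positive eigen-subspace of p1*K1 - p2*K2.
w, Y = np.linalg.eigh(p1 * K1 - p2 * K2)
P1 = Y[:, w > 0] @ Y[:, w > 0].T
P2 = np.eye(n) - P1

def energy(P, X):
    """(P x, x) for every row x of X."""
    return np.einsum("ni,ij,nj->n", X, P, X)

def decide(X):
    """Decision rule (13): True means class S1, False means class S2."""
    return energy(P1, X) > energy(P2, X)

enr_c_P = p1 * np.trace(P1 @ K1) + p2 * np.trace(P2 @ K2)
enr_e_P = p1 * np.trace(P2 @ K1) + p2 * np.trace(P1 @ K2)

# Enr_C(A1, A2): only the energy of signals that fall into their own decision set.
own_1 = decide(x1)
own_2 = ~decide(x2)
enr_c_A = (p1 * np.sum(energy(P1, x1)[own_1]) / len(x1)
           + p2 * np.sum(energy(P2, x2)[own_2]) / len(x2))
gap = enr_c_P - enr_c_A
```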

Example 1. Suppose the object of recognition $\omega$ belongs to one of the classes $S_i$, $i = 1, 2$.
Assume that the a priori probabilities of the classes are equal, $p_1 = p_2 = 1/2$; the conditional
distributions $\mu_i(A) = \mathsf{P}(\xi \in A \mid S_i)$, $i = 1, 2$, have identical covariance matrices
equal to $R$; and the mathematical expectations $m_1$, $m_2$ are orthogonal as vectors.

We choose an orthonormal basis $e_i$, $i = 1 \dots n$, in $H$ such that $e_1 = m_1/\|m_1\|$,
$e_n = m_2/\|m_2\|$. We get from (8) that $K_1 = R + \|m_1\|^2 p_1$, $K_2 = R + \|m_2\|^2 p_2$, where in
this example $p_1 x = (x, e_1)e_1$ and $p_2 x = (x, e_n)e_n$ denote one-dimensional projections. In the
chosen basis, the matrix $\tfrac{1}{2}K_1 - \tfrac{1}{2}K_2 = \tfrac{1}{2}(K_1 - K_2)$
is diagonal with eigenvalues $\|m_1\|^2/2, 0, \dots, 0, -\|m_2\|^2/2$. Then
$P_1 x = (x, m_1)m_1/\|m_1\|^2$, $P_2 x = (x, m_2)m_2/\|m_2\|^2$. If $x = \xi(\omega)$ is the pattern of
the object $\omega$, then by (13) we have the following decision rule: $\omega \in S_1$ if
$(m_1, x)^2/\|m_1\|^2 > (m_2, x)^2/\|m_2\|^2$ and $\omega \in S_2$ otherwise.

8 Normalization by trace 

Suppose $x = \xi(\omega)$ is the pattern of the object $\omega$ and
$\mathsf{E}((P\xi, \xi) \mid S_i) = \operatorname{tr} PK_i$, $i = 1, 2$, are the conditional energy
distributions on projections. The conditional energy distributions on projections of different
classes are not comparable if the traces of the correlation operators $K_i$, $i = 1, 2$, are not
equal. It is possible to normalize the conditional energy distribution on projections by
normalizing the patterns of the objects of each class as follows:
$\eta_i = \xi/\sqrt{\operatorname{tr} K_i}$, $i = 1, 2$. Then the correlation operators are normalized as
$\tilde K_1 = K_1/\operatorname{tr} K_1$, $\tilde K_2 = K_2/\operatorname{tr} K_2$, where
$\operatorname{tr} \tilde K_1 = \operatorname{tr} \tilde K_2 = 1$. It is also necessary to normalize
the object patterns $x = \xi(\omega)$ in the decision rule (13). So we have the following decision
rule: $\omega \in S_1$ if $(P_1 x, x)/\operatorname{tr} K_1 > (P_2 x, x)/\operatorname{tr} K_2$ and
$\omega \in S_2$ otherwise.

Example 2. We consider a classical recognition task with two classes: the class $S_1$ is a
random signal $\xi = a + \eta$, where $a$ is a non-random signal and $\eta$ is white noise; the class
$S_2$ is the white noise $\eta$. Suppose $p_1 = p_2 = 1/2$.

The correlation matrix of the white noise $\eta$ is $\sigma^2 I$, where $\sigma^2$ is a constant and
$I$ is the identity matrix. The mathematical expectations of the random signals of the classes $S_i$,
$i = 1, 2$, are respectively $m_1 = a$, $m_2 = 0$. If we apply the decision rule of Example 1, the
classifier always decides that $\omega \in S_1$.

We normalize the correlation matrices of both classes by their traces. From (8), we
have $K_1 = \sigma^2 I + \|a\|^2 p_{\hat a}$, where $p_{\hat a} x = (x, \hat a)\hat a$, $\hat a = a/\|a\|$;
we also have $K_2 = \sigma^2 I$. Then
$\operatorname{tr} K_1 = \sigma^2 \operatorname{tr} I + \|a\|^2 \operatorname{tr} p_{\hat a} = n\sigma^2 + \|a\|^2$
and $\operatorname{tr} K_2 = n\sigma^2$. Since the covariance matrices of both
classes are $\sigma^2 I$, they are diagonal in any basis. We choose a basis in $H$ such that
$e_1 = \hat a$. Then the matrix
$p_1 \tilde K_1 - p_2 \tilde K_2 = \tfrac{1}{2}(K_1/\operatorname{tr} K_1 - K_2/\operatorname{tr} K_2)$ is
diagonal in the chosen basis with the following eigenvalues:

$$\frac{(n-1)\|a\|^2}{2n(n\sigma^2 + \|a\|^2)}, \quad -\frac{\|a\|^2}{2n(n\sigma^2 + \|a\|^2)}, \quad \dots, \quad -\frac{\|a\|^2}{2n(n\sigma^2 + \|a\|^2)}.$$

Here the first eigenvalue is positive and the last $n - 1$ eigenvalues are negative. So the
projection $P_1$ is a one-dimensional projection: $P_1 x = (x, e_1)e_1$. Then
$(P_1 x, x) = (x, a)^2/\|a\|^2$ and $(P_2 x, x) = ((I - P_1)x, x) = (x, x) - (x, a)^2/\|a\|^2$. By (9), the
variance of the white noise $\eta$ is equal to $\mathsf{E}\|\eta\|^2 = \mathsf{E}(\eta, \eta) = n\sigma^2$.
So the signal-to-noise ratio is defined as $\mathrm{SNR} = \|a\|^2/(n\sigma^2)$.

Normalizing the object pattern by the trace, we get from (13) the following decision
rule: $\omega \in S_1$ if $(x, a)^2/(1 + \mathrm{SNR}) > \|x\|^2\|a\|^2 - (x, a)^2$ and $\omega \in S_2$
otherwise.

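We have $\operatorname{tr} P_2 K_1 = \sigma^2 \operatorname{tr} P_2 = (n-1)\sigma^2$ and
$\operatorname{tr} P_1 K_2 = \sigma^2 \operatorname{tr} P_1 = \sigma^2$. Then

$$\mathrm{Enr}_E(P_1, P_2) = \frac{p_1}{\operatorname{tr} K_1} \operatorname{tr} P_2 K_1 + \frac{p_2}{\operatorname{tr} K_2} \operatorname{tr} P_1 K_2 = \frac{(n-1)\sigma^2}{2(n\sigma^2 + \|a\|^2)} + \frac{\sigma^2}{2n\sigma^2} = \frac{n-1}{2n(1 + \mathrm{SNR})} + \frac{1}{2n}.$$

Thus the energy of "error" recognition is small if the SNR and the dimension $n$ of the
feature space $H$ are large.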
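The quantities of Example 2 can be reproduced numerically. The sketch below is an illustration under assumed values (NumPy, dimension $n = 8$, $\sigma^2 = 1$, and a hypothetical signal `a`); it checks the largest eigenvalue, the closed form of the "error" energy, and the trace-normalized decision rule.

```python
import numpy as np

n, sigma2 = 8, 1.0
a = np.arange(1.0, n + 1.0)             # hypothetical non-random signal
a_hat = a / np.linalg.norm(a)
norm_a2 = float(a @ a)

K1 = sigma2 * np.eye(n) + norm_a2 * np.outer(a_hat, a_hat)
K2 = sigma2 * np.eye(n)
# Trace-normalized difference with p1 = p2 = 1/2.
D = 0.5 * (K1 / np.trace(K1) - K2 / np.trace(K2))
w, Y = np.linalg.eigh(D)

P1 = Y[:, w > 0] @ Y[:, w > 0].T
P2 = np.eye(n) - P1
snr = norm_a2 / (n * sigma2)

enr_e = float(0.5 * np.trace(P2 @ K1) / np.trace(K1)
              + 0.5 * np.trace(P1 @ K2) / np.trace(K2))
enr_e_closed = (n - 1) / (2 * n * (1 + snr)) + 1 / (2 * n)
top_eig_closed = (n - 1) * norm_a2 / (2 * n * (n * sigma2 + norm_a2))

def decide(x):
    """Trace-normalized rule: 1 for class S1, 2 for class S2."""
    lhs = (x @ a) ** 2 / (1 + snr)
    rhs = (x @ x) * norm_a2 - (x @ a) ** 2
    return 1 if lhs > rhs else 2

label_signal = decide(a)                            # the signal itself
label_orth = decide(np.array([2.0, -1.0] + [0.0] * (n - 2)))   # orthogonal to a
```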

9 Normalization by signal norm 

We can also normalize the object patterns by dividing each signal $x = \xi(\omega)$, as a vector, by
its norm. In that case, the ends of the normalized random vectors are located on the unit sphere.
Suppose $\mathsf{P}(\xi = 0) = 0$. Putting $\eta = \xi/\|\xi\|$, we have

$$\mathsf{E}(\eta, \eta) = \mathsf{E}\big((\xi, \xi)/\|\xi\|^2\big) = \mathsf{E}\big(\|\xi\|^2/\|\xi\|^2\big) = 1. \tag{16}$$

Let $K$ be the correlation operator of the normalized random signal $\eta$. From (9) and
(16), we have $\operatorname{tr} K = 1$. Hence, the energy distribution on projections is normalized.

If the object patterns are normalized as $x = \xi(\omega)/\|\xi(\omega)\|$, then
$g_i(x) = (P_i x, x) \le 1$, $i = 1, 2$. This yields $\sup g_i(x) = 1$, where $i = 1, 2$. So the
discriminant functions $g_i(x)$, $i = 1, 2$, are classical membership functions [1].
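A short numerical check of the norm normalization, given as a sketch under stated assumptions (NumPy, a hypothetical sample `xi`, and a hypothetical projection `P` onto the first two coordinates): after each signal is divided by its norm, the correlation matrix has unit trace and the discriminant values lie between 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical sample of signals with very different energies per coordinate.
xi = rng.normal(size=(300, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

eta = xi / np.linalg.norm(xi, axis=1, keepdims=True)   # normalized patterns
K = eta.T @ eta / len(eta)                             # their correlation matrix

P = np.diag([1.0, 1.0, 0.0, 0.0, 0.0])                 # projection onto e1, e2
g = np.einsum("ni,ij,nj->n", eta, P, eta)              # (P x, x) for each pattern
```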

The vectors $x$ and $\lambda x$ for any $\lambda > 0$ describe the same physical state in quantum
mechanics. This means that states of quantum systems are rays, i.e. points of a projective space.
Due to this fact, we can consider only states with unit norm $\|x\| = 1$.

The same holds for sound signals and monochrome images. In fact, the sound signals
$x$ and $\lambda x$ for any $\lambda > 0$ differ in loudness only. A monochrome image can be described
as a set of $l = nm$ real numbers corresponding to the intensity of the light in each pixel.
Hence the space of monochrome images can be described as a vector space of dimension
$l = nm$. All the intensities of a monochrome image can be multiplied by a number $\lambda > 0$,
and that does not change the image.

10 Subtraction of mean 

The following hypothesis is accepted in recognition theory: the distribution of the
patterns of a class is concentrated in a compact area of the feature space. It is natural to
assume that the distribution of patterns is grouped around the mean (mathematical
expectation) of this distribution. Then each object pattern $x = \xi(\omega)$ can be written as the sum
$x = y + a$, where $a$ is the mean and $y$ is a random vector from the compact area that starts
at the end of the mean $a$.

On the other hand, the linear subspaces that correspond to classes in the feature space
intersect at the zero point of the space $H$ (the origin of coordinates). Therefore, if
quantum logic is used for recognition, it is natural to move the compact areas to
the origin of coordinates.
In this case, the energy distributions on projections are described by the covariance
operators.

Suppose the conditional distributions $\mu_i(A) = \mathsf{P}(\xi \in A \mid S_i)$, $i = 1, 2$, have the
covariance operators $R_1$, $R_2$ and the means $m_1$, $m_2$. Then it is necessary to find projections
$P_1$, $P_2$ such that the energy
$\mathrm{Enr}_C(P_1, P_2) = p_1 \operatorname{tr} P_1 R_1 + p_2 \operatorname{tr} P_2 R_2$ is maximal.
After subtracting the class means from the object patterns $x = \xi(\omega)$, we get from (13) the
following decision rule: $\omega \in S_1$ if $(P_1(x - m_1), x - m_1) > (P_2(x - m_2), x - m_2)$ and
$\omega \in S_2$ otherwise.
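The mean-subtracted rule can be sketched as follows. This is only an illustrative sketch, assuming NumPy; the functions `fit` and `decide` and the toy data (two classes with a common mean and different directions of spread) are hypothetical and chosen so that the behaviour of the rule is easy to follow.

```python
import numpy as np

def fit(x1, x2, p1=0.5, p2=0.5):
    """Class means and projections built from the covariance operators."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    R1 = np.cov(x1, rowvar=False, bias=True)
    R2 = np.cov(x2, rowvar=False, bias=True)
    w, Y = np.linalg.eigh(p1 * R1 - p2 * R2)
    P1 = Y[:, w > 0] @ Y[:, w > 0].T
    return m1, m2, P1, np.eye(len(m1)) - P1

def decide(x, m1, m2, P1, P2):
    """1 if (P1(x-m1), x-m1) > (P2(x-m2), x-m2), otherwise 2."""
    d1, d2 = x - m1, x - m2
    return 1 if d1 @ P1 @ d1 > d2 @ P2 @ d2 else 2

rng = np.random.default_rng(5)
mean = np.array([1.0, 1.0])
# Hypothetical classes: same mean, class 1 spreads along e1, class 2 along e2.
x1 = mean + rng.normal(size=(500, 2)) * np.array([3.0, 0.3])
x2 = mean + rng.normal(size=(500, 2)) * np.array([0.3, 3.0])

m1, m2, P1, P2 = fit(x1, x2)
label_a = decide(mean + np.array([3.0, 0.1]), m1, m2, P1, P2)
label_b = decide(mean + np.array([0.1, -3.0]), m1, m2, P1, P2)
```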